Area and time efficient implementations of matrix multiplication on FPGAs
نویسندگان
چکیده
We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce the latency as well as the area. Our designs improve the previous designs in [7] and [1] in terms of the area/speed metric where the speed denotes the maximum achievable running frequency. The area/speed metrics for the designs in [7], [1], and our design are 14.45, 4.93, and 2.35, respectively, for 4 × 4 matrix multiplication. The latency of the design in [7] is 0.57μs, while our design takes 0.15μs using 18% less area. The area of our designs is smaller by 11%−46% compared with the best known systolic designs based on [9] with the same latency for the matrices of sizes 3×3−12×12. The performance improvements tend to grow with the problem size.
منابع مشابه
Energy Performance of Floating-Point Matrix Multiplication on FPGAs
Floating-point matrix multiplication is a basic kernel in scientific computing. It has been shown that implementations of this kernel on FPGAs can achieve high sustained performance [1]. However, to the best of our knowledge, existing work on FPGA-based floating-point matrix multiplication considers the optimization of latency or area only. In this paper, we analyze the impact of various parame...
متن کاملA New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication can not run on Fibonacci Hypercube structure, therefore giving a method that can be run on all structures especially Fibonacci Hypercube structure is necessary for parallel matr...
متن کاملArea-Time-Power of Modular Multipliers implemented in FPGA
Three modular multiplication algorithms are described and compared: the so-called Multiply and Reduce, the Shift and Add, and finally, the Montgomery product. An estimation of the cost of their combinational implementation using Xilinx FPGAs family is calculated. Practical results in term of area, delay, and power for both combinational and completely sequential implementations are presented.
متن کاملArea/performance trade-off analysis of an FPGA digit-serial GFð2Þ Montgomery multiplier based on LFSR
Montgomery Multiplication is a common and important algorithm for improving the efficiency of public key cryptographic algorithms, like RSA and Elliptic Curve Cryptography (ECC). A natural choice for implementing this time consuming multiplication defined on finite fields, mainly over GFð2Þ, is the use of Field Programmable Gate Arrays (FPGAs) for being reconfigurable, flexible and physically s...
متن کاملA Model-Based Methodology for Application Specific Energy Efficient Data Path Design Using FPGAs
We present a methodology to design energy-efficient data paths using FPGAs. Our methodology integrates domain specific modeling, coarse-grained performance evaluation, design space exploration, and low level simulation to understand the tradeoffs between energy, latency, and area. The domain specific modeling technique defines a high-level model by identifying various components and parameters ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002